Object detection device, object detection server, and object detection method

ABSTRACT

A tag communication section (12) receives tag information transmitted from an information tag (11) attached to a person (P) to be detected. An attribute lookup section (15) looks up an attribute storage section (16) using an ID included in the tag information to obtain attribute information such as the height of the person (P). A target detection section (14) specifies the position, posture and the like of the person (P) in an image obtained from an imaging section (13) using the attribute information.

TECHNICAL FIELD

[0001] The present invention relates to a technology for detecting the accurate position, posture and the like of a given detection target, such as a person or an object, in an image.

BACKGROUND ART

[0002] Patent literature 1 below discloses a technology of controlling the orientation and zoom of a monitor camera in response to an approach of a transmitter. Specifically, a radio transmitter transmitting an ID code is attached to a person to be monitored. An antenna for detecting an approach of a transmitter is placed in an off-limits area. Once an approach of a transmitter is detected via the antenna, a monitor camera capable of capturing the surroundings of the antenna is automatically selected from among a plurality of monitor cameras, and an image taken with the selected camera is displayed on a monitor. In addition, the ID code transmitted from the transmitter is read via the antenna, and based on the height of the person associated in advance with the ID code, the orientation and zoom of the monitor camera are determined.

[0003] (Patent Literature 1) Japanese Laid-Open Patent Publication No. 9-46694

[0004] Problems to be Solved

[0005] In recent years, with the proliferation of the Internet, use of monitoring systems has started in which a monitor camera is connected to the Internet to enable transmission of images taken with the monitor camera. Such a system is low in cost and allows easy placement of monitor cameras compared with a system using an exclusive line. Even in such a system, however, an operator for monitoring images is still necessary. In the future, therefore, a technology permitting not only automatic capturing of a target to be monitored but also automatic retrieval of useful information from images is desired.

[0006] Also, with the recent advance of robot technology, achievement of a home robot for assisting human lives is expected. Such a robot must have the function of detecting the situation surrounding itself and acting in harmony with that situation. For example, to move properly in a house and work for persons and objects, a robot must accurately detect the position, posture and motion of the persons and objects surrounding itself. Otherwise, the robot will not be able to move and work accurately, much less assist human lives.

[0007] With the proliferation of portable cameras such as hand-held video cameras, digital cameras and camera-equipped mobile phones, it is increasingly desired that even an unskilled user be able to photograph a subject properly. For this, it is important to detect the position and the like of a target accurately.

[0008] However, the prior art described above has difficulty responding to the needs described above. Specifically, the above prior art merely selects a camera capturing a detection target in response to an approach of a transmitter. The prior art falls short of acquiring detailed information such as where the detection target is in a captured image and what posture the target takes. In addition, with the use of an antenna to locate the position, a comparatively large error (several meters to over ten meters) occurs in the position information. Therefore, if another person is present near the detection target, it is difficult to distinguish one from the other in an image.

[0009] Some technologies are conventionally known for detecting an object from a camera image by image processing alone. However, while usable under very restricted conditions, most of these technologies suffer frequent misdetection in environments in which humans normally live, and thus their application is difficult. The reasons for this are that the dynamic range of a camera itself is limited, that various objects other than the detection target and the background exist, and that the image of one target at the same position may change in various ways with changes of sunshine and illumination.

[0010] The visual mechanism of a human can detect a target accurately even in an environment subject to large changes, by utilizing an enormous amount of knowledge and rules acquired from experience. Currently, it is extremely difficult to incorporate knowledge and rules like those acquired by humans into equipment. In addition, an enormous processing amount and memory amount would be necessary to achieve such processing, and this would disadvantageously increase the processing time and the cost.

[0011] In view of the above problems, an object of the present invention is to provide an object detection technology permitting precise detection of the position, posture and the like of a detection target in an image without requiring an enormous processing amount.

DISCLOSURE OF THE INVENTION

[0012] The object detection equipment of the present invention includes: an imaging section for taking an image; a tag communication section for receiving tag information transmitted from an information tag attached to a given target; and a target detection section for detecting the given target in the image taken by the imaging section using the tag information received by the tag communication section.

[0013] Accordingly, the target detection section uses tag information transmitted from an information tag attached to a given target in detection of the given target in an image taken by the imaging section. That is, information unobtainable from the image can be obtained from the tag information or retrieved based on the tag information, and such information can be utilized for image processing. Therefore, a given target can be detected precisely from an image without requiring an enormous processing amount.

[0014] In the object detection equipment of the present invention, preferably, the tag information includes attribute information representing an attribute of the given target, and the target detection section performs the detection using the attribute information included in the tag information received by the tag communication section.

[0015] In the object detection equipment of the present invention, preferably, the tag information includes ID information of the given target, the object detection equipment further includes: an attribute storage section for storing a correspondence between ID information and attribute information; and an attribute lookup section for looking up the contents of the attribute storage section using the ID information included in the tag information received by the tag communication section, to obtain attribute information of the given target, and the target detection section performs the detection using the attribute information obtained by the attribute lookup section.

[0016] Preferably, the target detection section includes: an image segmentation portion for determining a partial image area having the possibility of including the given target in the image; and an image recognition portion for detecting the given target in the partial image area determined by the image segmentation portion, and at least one of the image segmentation portion and the image recognition portion performs processing by referring to the attribute information.

[0017] In the object detection equipment of the present invention, preferably, the tag information includes position information representing a position of the information tag, and the target detection section performs the detection by referring to the position information included in the tag information received by the tag communication section.

[0018] In the object detection equipment of the present invention, preferably, the tag communication section estimates a position of the information tag from a state of reception of the tag information, and the target detection section performs the detection by referring to the position estimated by the tag communication section.

[0019] In the object detection equipment of the present invention, preferably, the tag information includes a detection procedure for the given target, and the target detection section performs the detection by executing the detection procedure included in the tag information received by the tag communication section.

[0020] In the object detection equipment of the present invention, preferably, the target detection section performs the detection with image processing alone, without use of the tag information, when the reception state of the tag communication section is poor.

[0021] The object detection server of the present invention receives an image taken by an imaging section and tag information transmitted from an information tag attached to a given target, and detects the given target in the image using the tag information.

[0022] The object detection method of the present invention includes the steps of: receiving an image taken by an imaging section; receiving tag information transmitted from an information tag attached to a given target; and detecting the given target in the image using the tag information.

BRIEF DESCRIPTION OF THE DRAWINGS

[0023] FIG. 1 is a block diagram showing a configuration of object detection equipment of Embodiment 1 of the present invention.

[0024] FIG. 2 is a block diagram conceptually showing an internal configuration of an information tag in Embodiment 1 of the present invention.

[0025] FIG. 3 is a flowchart showing a flow of processing of object detection in Embodiment 1 of the present invention.

[0026] FIG. 4 is a flowchart showing an example of processing in step S4 in FIG. 3.

[0027] FIG. 5 is a view showing a situation of the object detection in Embodiment 1 of the present invention.

[0028] FIG. 6 shows an example of an image taken by an imaging section in the situation of FIG. 5.

[0029] FIG. 7 is a view showing candidate areas determined by an image segmentation portion in the image of FIG. 6.

[0030] FIG. 8 is a view showing a candidate area determined using position information obtained from the information tag, in the image of FIG. 6.

[0031] FIG. 9 is a view showing an area determined from the candidate areas shown in FIGS. 7 and 8.

[0032] FIG. 10 shows an example of information used for generation of a template.

[0033] FIG. 11 is a view showing motion trajectories obtained from an image.

[0034] FIG. 12 is a view showing a motion trajectory based on the position information obtained from the information tag.

[0035] FIG. 13 is a view showing an example of placement of a plurality of cameras.

[0036] FIG. 14 is a flowchart showing a method of determining a coordinate transformation for transforming spatial position coordinates of a detection target to image coordinates by camera calibration.

[0037] FIG. 15 is a view showing a situation of object detection in Embodiment 2 of the present invention.

[0038] FIG. 16 shows an example of an image taken by an imaging section in the situation of FIG. 15.

[0039] FIG. 17 is a view showing candidate areas determined by an image segmentation portion in the image of FIG. 16.

[0040] FIG. 18 is a view showing a candidate area determined using position information obtained from an information tag in the image of FIG. 16.

[0041] FIG. 19 is a view showing an area determined from the candidate areas shown in FIGS. 17 and 18.

[0042] FIG. 20 is a view showing a situation of object detection in Embodiment 3 of the present invention.

[0043] FIG. 21 is a block diagram showing a configuration example of a camera 50 in FIG. 20.

[0044] FIG. 22 is a flowchart showing a flow of processing in Embodiment 3 of the present invention.

[0045] FIG. 23 shows an example of an image obtained during the processing in FIG. 22.

[0046] FIG. 24 shows an example of a human-shaped template used during the processing in FIG. 22.

[0047] FIG. 25 is a flowchart showing an example of a procedure for switching the detection processing.

BEST MODE FOR CARRYING OUT THE INVENTION

[0048] Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that a component common to a plurality of drawings is denoted by the same reference numeral, and the detailed description thereof may be omitted in some cases.

[0049] (Embodiment 1)

[0050] FIG. 1 is a block diagram showing a configuration of object detection equipment of Embodiment 1 of the present invention. In the configuration of FIG. 1, a person P is a detection target. Referring to FIG. 1, a tag communication section 12 receives tag information transmitted from an information tag 11 attached to the person P. An imaging section 13 takes images. A target detection section 14 detects the person P in the image taken by the imaging section 13 using the tag information received by the tag communication section 12. The imaging section 13 is placed at a position at which images including the person P can be taken.

[0051] FIG. 2 is a block diagram conceptually showing an internal configuration of the information tag 11. Referring to FIG. 2, a communication portion 110 communicates with the tag communication section 12 in a noncontact manner via a radio wave, an acoustic wave, light or the like, and sends predetermined tag information. A storage portion 111 stores, as tag information, attribute information (height, age, sex and the like), ID information and the like of the person P to whom the information tag 11 is attached. A position detection portion 112 detects the position of the information tag 11 with a positioning system using an artificial satellite, such as the Global Positioning System (GPS), and outputs the result as tag information. The information tag 11 transmits the tag information output from its storage portion 111 and position detection portion 112 via its communication portion 110.

[0052] The information tag 11 may be attached to a mobile phone carried by the person P, for example. Both the storage portion 111 and the position detection portion 112 may be used, or either one of them may be used. As a positioning system for providing the position information other than the GPS, a system based on measurement of the distance of a mobile phone or a PHS (Personal Handy-phone System) from its base station, for example, may be used.

[0053] The configuration of FIG. 1 also includes: an attribute storage section 16 for storing the correspondence between ID information and attribute information; and an attribute lookup section 15 for obtaining attribute information of the person P by looking up the stored contents of the attribute storage section 16 using the ID information included in the tag information received by the tag communication section 12.

[0054] The ID information as used herein refers to a code or a mark associated in advance with each target. The ID information may be given for each individual target or for each category the target belongs to. For example, in the case that the target is a person, as in this embodiment, ID information unique to each person may be given, or ID information unique to each family and common to the members of the family may be given. In the case that the target is an object such as a pen, for example, ID information unique to each pen may be given, or ID information unique to each color and shape and common to pens having the same color and shape may be given.

[0055] The target detection section 14 detects the person P from images taken by the imaging section 13 using the tag information received by the tag communication section 12 and/or the attribute information obtained by the attribute lookup section 15. The target detection section 14 includes an image segmentation portion 141 and an image recognition portion 142.

[0056] The image segmentation portion 141 determines a partial image area having the possibility of including the person P in an image taken by the imaging section 13. When the GPS or the like is used for detection of the position of the information tag 11 as described above, a large error is less likely to occur. However, it is still difficult to attain high-precision position detection and to acquire detailed information such as the posture and face position of a person. This is due to the theoretical limits of sensor precision, the influence of errors and disturbances in the actual use environment, practical limits on the number of usable sensors, and other reasons. As for image processing, the partial image area can be determined with comparatively high precision under idealistic conditions (for example, a situation where the illumination varies little and only a limited set of objects is captured). However, misdetection is likely to occur in situations where various objects exist and the illumination varies, as in images taken outdoors.

[0057] The image segmentation portion 141 uses the tag information and the attribute information in integration with the image, so that the partial image area can be determined with higher precision. A plurality of partial image areas may be determined, and no area may be determined if there is no possibility that the detection target exists.

[0058] The image recognition portion 142 detects whether or not the person P exists in the partial image area determined by the image segmentation portion 141 and, if the person P exists, detects the position, posture and motion of the person P. As described above, a large error is less likely to occur in the position information from the information tag 11, but it is difficult to acquire high-precision and detailed information from the position information. As for image processing alone, it is difficult to detect many unspecified persons with high precision. In view of these, the image recognition portion 142 uses the height of the person P, for example, as the tag information or the attribute information, and this improves the precision of the image recognition processing. For example, when template matching is adopted as the recognition technique, the size of the template may be set according to the height of the person P. This improves the detection precision, and also reduces the processing amount because the templates used for the recognition processing can be narrowly defined.

[0059] As described above, in the configuration of FIG. 1, by using the tag information received by the tag communication section 12 and the attribute information obtained by the attribute lookup section 15 in integration with an image, the person P as a given target can be detected in the image with high precision while suppressing an increase in the processing amount.

[0060] FIG. 3 is a flowchart showing a flow of the object detection processing in this embodiment. In this flow, the processing using an image (S2, S3 and S4) and the information acquiring processing using an information tag (S5, S6 and S7) may be performed in parallel with each other. Referring to FIG. 3, first, an instruction for detection of a given target is issued (S1). The target may be designated by the user of the system or by the person to be detected himself or herself. Alternatively, the target may be automatically designated according to the time and the place, or all targets for which tag information can be obtained from respective information tags may be designated.

[0061] Thereafter, a camera supposed to capture the detection target is specified (S2). When a plurality of cameras are used, as will be described later, all cameras capable of capturing the target are selected. In this example, a camera having the possibility of capturing the detection target may be designated in advance, or a camera may be selected using the position information of an information tag obtained in step S7 described later. Images are then obtained from the camera specified in step S2 (S3).

[0062] An information tag associated with the designated target is specified (S5). Tag information is then obtained from the specified information tag (S6), and information on the target is acquired from the obtained tag information (S7). The acquired information includes, for example, the attribute information such as the height, age and sex read from the attribute storage section 16 and the position information and the like included in the tag information. Finally, detailed information such as the position of the target is specified in the image obtained in step S3 using the information acquired in step S7 (S4).

[0063] FIG. 4 is a flowchart showing an example of processing in step S4. First, motion vectors in an image are computed (S41). Using the motion vectors, a set M of image areas within each of which the directions of the vectors are the same is determined (S42). An area whose position and motion match the position and motion of the target acquired from the information tag is then selected from the image area set M as a set Ms (S43). A template H in the shape of a human is then generated to reflect the attributes such as the height, age and sex acquired from the information tag (S44). Shape checking using the human-shaped template H generated in step S44 is conducted in the image area set Ms (S45). Finally, the position giving the highest degree of matching in the shape check of step S45 is detected as the position of the target (S46).
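By way of illustration only, the following Python sketch traces the flow of steps S41 to S46, assuming OpenCV for the motion vectors and the shape check. The grouping of areas is simplified to a single bounding box, and the angle threshold and search radius are assumed values, not figures from this disclosure.

```python
import cv2
import numpy as np

def detect_target(prev_gray, cur_gray, tag_xy, tag_dir, template, radius=80):
    # S41: dense motion vectors between two consecutive gray-scale frames
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # S42/S43: keep pixels whose motion direction agrees with the tag's
    # reported direction and that lie near the (coarse) tag position
    dirs = np.arctan2(flow[..., 1], flow[..., 0])
    mask = np.abs(np.angle(np.exp(1j * (dirs - tag_dir)))) < 0.5
    ys, xs = np.nonzero(mask)
    near = (np.abs(xs - tag_xy[0]) < radius) & (np.abs(ys - tag_xy[1]) < radius)
    if not near.any():
        return None
    # bounding box of the surviving pixels stands in for the area set Ms
    x0, y0 = xs[near].min(), ys[near].min()
    x1, y1 = xs[near].max(), ys[near].max()
    roi = cur_gray[y0:y1 + 1, x0:x1 + 1]
    if roi.shape[0] < template.shape[0] or roi.shape[1] < template.shape[1]:
        return None
    # S44/S45: shape check with the human-shaped template H in that area
    scores = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED)
    # S46: the position giving the highest degree of matching is the target
    _, best, _, loc = cv2.minMaxLoc(scores)
    return (x0 + loc[0], y0 + loc[1]), best
```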

[0064] The processing in this embodiment will be described in more detail with reference to FIGS. 5 to 9.

[0065] In FIG. 5, the tag communication section 12 is in a communication center 30, while the target detection section 14, the attribute lookup section 15 and the attribute storage section 16 are in a monitoring center 31. Assume herein that a given person Pa carrying an information tag 11 (attached to a mobile phone 33) is detected with a camera (imaging section 13) placed outdoors on the side of a street. The information tag 11 transmits ID information of the person Pa and rough GPS position information (with an error of about 10 m). In the communication center 30, the tag communication section 12 receives the information transmitted from the information tag 11. The monitoring center 31, as the object detection server, obtains an image from the camera 13 and the tag information received in the communication center 30 via a communication network. The attribute storage section 16 stores attribute information such as the height, age and sex associated with ID information.

[0066] Assume that currently a person Pb and a car C exist near the person Pa and that the person Pa is moving toward the imaging section 13 while the person Pb and the car C are moving away from the imaging section 13. At this moment, an image as shown in FIG. 6 is taken with the camera 13, in which the persons Pa and Pb and the car C are captured.

[0067] First, the image segmentation portion 141 of the target detection section 14 executes steps S41 to S43 shown in FIG. 4, to determine a partial image area having the possibility of including the person Pa.

[0068] Specifically, motion vectors in the image of FIG. 6 are computed, to determine image areas within which the directions of the vectors are the same. FIG. 7 shows the areas obtained in this way. In FIG. 7, areas APa, APb and AC are formed at and around the positions of the persons Pa and Pb and the car C, respectively.

[0069] An area whose position and motion match the position and motion of the person Pa obtained from the tag information is then selected from the areas shown in FIG. 7. FIG. 8 shows the candidate area determined when only the position information obtained from the information tag 11 is used. In FIG. 8, the position specified by the position information is transformed to a position on the camera image, shown as an area A1. Since the position information from the GPS includes an error, a large area including both persons Pa and Pb is given as the candidate area A1 to allow for the possible error. From FIGS. 7 and 8, the overlap areas, that is, the areas APa and APb, are taken as candidate areas. Thus, the area AC of the car C can be excluded by using the position information of the person Pa.

[0070] Further, from the change in the position information, it is found that the target is moving toward the camera. Therefore, by checking the directions of the motion vectors in the areas APa and APb, the candidate areas can be narrowed to only the area APa, as shown in FIG. 9.

[0071] Thereafter, the image recognition portion 142 of the target detection section 14 executes steps S44 to S46 in FIG. 4, to specify the position and posture of the person Pa in the image. That is, template matching is performed for the area APa to detect the position and motion of the person Pa more accurately.

[0072] Use of an unnecessarily large number of templates will increase the possibility of erroneously detecting a person or object different from the detection target, and will also require a vast amount of matching processing. Therefore, the templates used for the matching are narrowly defined according to the attribute information, such as the height, age and sex, obtained by the attribute lookup section 15. In this way, the detection precision can be improved, and the processing amount can be reduced.

[0073] An example of how the templates are narrowly defined will be described. FIG. 10 shows an example of information used for generation of templates, in which the height range, the average body type and the average clothing size are shown for each age bracket (child, adult, senior). Assume that the age of the person Pa is found to be “20” from the attribute information. Since this age belongs to the age bracket “adult”, the average height and body type can be obtained from the relationship shown in FIG. 10. The obtained height and body type are transformed to a size and shape on the camera image, to thereby generate a shape template used for the matching.

[0074] When the height range has a certain width, a plurality of templates of different sizes may be generated within the height range. If the exact height of a target is directly obtainable as attribute information, this height value may be used. If the information tag is attached to the clothing of the target and the size of the clothing is obtained as tag information, the height of the target can be obtained from the relationship shown in FIG. 10, to generate a template.
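As one sketch of how a height range like that of FIG. 10 might be turned into a family of template sizes, the following assumes a simple pinhole-camera relation (on-image size = focal length × real size / distance). The bracket boundaries and height ranges are placeholder values; the actual figures of FIG. 10 are not reproduced here.

```python
# Illustrative age brackets and height ranges [cm]; assumed, not from FIG. 10
AGE_BRACKETS = {
    "child":  (100, 150),
    "adult":  (150, 190),
    "senior": (140, 175),
}

def bracket_for_age(age):
    if age < 15:
        return "child"
    return "senior" if age >= 65 else "adult"

def template_heights_px(age, distance_m, focal_px, step_cm=10):
    """Candidate template heights (pixels) for a target of the given age."""
    lo, hi = AGE_BRACKETS[bracket_for_age(age)]
    # pinhole model: on-image size = focal length * real size / distance
    return [focal_px * (h / 100.0) / distance_m
            for h in range(lo, hi + 1, step_cm)]

# e.g. an "adult" of age 20 seen at 10 m with an 800-pixel focal length
print(template_heights_px(20, 10.0, 800.0))
```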

[0075] Once the accurate position of the person Pa in the area APa is detected by the template matching, the change of the position with time may be followed to obtain the accurate motion of the person Pa. If shape templates of different postures are used, the posture of the person Pa can also be detected. Each template used for the matching may be a gray-scale image, a binary image representing the shape, or a color image including color information, or it may merely represent the outline.

[0076] Once the accurate position of the person Pa is successfully detected in the image, the camera can zoom in on the person Pa to capture only the person Pa, for example. Alternatively, the zooming rate may be lowered so that the surroundings of the person Pa are captured at all times. This enables continuous monitoring not only of the state of the person Pa but also of how the person Pa comes into contact with neighboring persons. This will be useful for crime investigation, behavior investigation and the like.

[0077] As described above, in this embodiment, in detection of a given target Pa in an image, tag information transmitted from the information tag 11 attached to the target Pa is used. In this way, information that would not be obtainable from the image can be obtained from the tag information itself (for example, attribute information and position information transmitted from the information tag 11) or can be retrieved using the tag information (for example, attribute information read from the attribute storage section 16), and such information can be utilized in image processing. This enables detection of a given target in an image without requiring an enormous processing amount.

[0078] In this embodiment, both the image segmentation portion 141 and the image recognition portion 142 refer to the information from the tag communication section 12 and the attribute lookup section 15. Instead, only one of the image segmentation portion 141 and the image recognition portion 142 may refer to this information, and the other may perform its processing using only the image information.

[0079] In this embodiment, the information tag 11 is attached to a mobile phone carried by the person Pa. Alternatively, the information tag 11 may be attached to other portable equipment such as a PDA. The information tag 11 may otherwise be attached to something carried by the target person, such as a stick or a basket; something moving together with the target person, such as a wheelchair or a shopping cart; or something worn by the target person, such as clothing, glasses or shoes. The information tag 11 may even be embedded in the body of the target person.

[0080] In this embodiment, the information tag 11 transmits attribute information stored in the storage portion 111 and position information detected by the position detection portion 112. The information to be transmitted is not limited to these. For example, the information tag 11 may transmit information on the motion of the detection target obtained using an acceleration sensor, a compass, a gyro device and the like, or may transmit information on the posture of the detection target obtained using a gyro device and the like. In the case of transmission of the posture, the template used for the recognition by the image recognition portion 142 can be limited to a shape template having a specific posture. This enables further improvement in detection precision and reduction in processing amount.

[0081] The attribute information stored in the information tag 11 and the attribute storage section 16 is not limited to that described in this embodiment. For example, when a person is the target to be detected, the skin color and the hair color may be stored as attribute information. The image segmentation portion 141 may then detect an area having the specific color, or the image recognition portion 142 may use a template reflecting the specific skin color or hair color. This processing is expected to improve the detection precision.

[0082] History information on a detection target, such as the detection time, the detection position, the motion speed and the clothing, may be stored as attribute information. By comparing such history information with information currently obtained, whether or not the detection result has an abnormality can be determined.

[0083] The detection target is not limited to a person, but may be a pet or a car, for example. When a pet is to be detected, an information tag may be attached to its collar or the like. By capturing images with an indoor camera, detailed images of a pet can be obtained and the behavior of the pet can be grasped from a distant place. In this case, if the state of the pet is to be checked on a small display screen such as that of a mobile phone, an image covering the entire room will fail to give a grasp of the state of the pet. According to the present invention, in which the position of the pet in an image can be accurately detected, only the image area including the pet can be displayed on the small display screen. Thus, the state of the pet can be easily grasped.

[0084] In the case of monitoring cars with an outdoor monitoring system and the like, information tags may be attached to cars in advance. With the information tag, the position of a specific car can be detected accurately in a monitoring image, and thus an image of the driver of the car can be easily acquired automatically, for example. This can be used for theft prevention and the like.

[0085] <Use of Motion Trajectory>

[0086] In the example described above, in the integrated use of the image information and the information obtained from the information tag, the candidate area was narrowed using the position and motion direction of the detection target. The present invention is not limited to this; the motion trajectory, for example, may be used, as will be described below.

[0087] Assume in this case that, in the environment shown in FIG. 5, the car C does not exist and both the person Pa carrying the information tag 11 and the person Pb are walking toward the camera 13, although the walking trajectories of the persons Pa and Pb are different.

[0088] FIG. 11 shows an image taken with the camera 13 in the above situation. In FIG. 11, motion trajectories TPa and TPb obtained from images are shown by the solid arrows. Although the image processing can provide detailed motion trajectories, it has difficulty distinguishing the person Pa from the person Pb.

[0089] FIG. 12 shows a motion trajectory T11 based on the position information obtained from the information tag 11. The position information is of low precision, and the error range of the position is represented by the width of the arrow in FIG. 12. Although the precision is low, the position information can provide the outline of the motion trajectory.

[0090] The motion trajectory T11 in FIG. 12 is compared with the trajectories TPa and TPb in FIG. 11 to determine the similarity. In the illustrated example, the trajectory TPa has the higher similarity to the motion trajectory T11. Therefore, the person Pa is specified as the detection target, and the accurate position of the person Pa in the image is obtained.

[0091] As described above, by use of the similarity of motion trajectories, persons and objects comparatively similar to each other in position and motion direction can be distinguished from each other. The similarity of motion trajectories can be determined by calculating the proportion of the overlap of the trajectories, comparing the lengths of the trajectories, comparing the positions at which the trajectories change direction, or comparing motion vector series, for example.
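The comparison of motion vector series mentioned above might be realized, for example, as follows. The assumption that trajectories are given as synchronized point sequences, and the cosine measure itself, are illustrative choices, not specified by this disclosure.

```python
import numpy as np

def trajectory_similarity(traj_a, traj_b):
    """Mean cosine similarity between the motion vectors of two (N, 2) paths."""
    a = np.diff(np.asarray(traj_a, dtype=float), axis=0)  # motion vectors
    b = np.diff(np.asarray(traj_b, dtype=float), axis=0)
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dots = (a * b).sum(axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-9
    return float(np.mean(dots / norms))   # 1.0 = same direction throughout

# The tag trajectory T11 would be compared against TPa and TPb; the image
# trajectory with the higher score identifies the person Pa.
```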

[0092] <Use of a Plurality of Cameras>

[0093] Although one camera was used in the example described above, it is needless to mention that a plurality of cameras may be used. For example, as shown in FIG. 13, for monitoring an out-of-sight place, a plurality of cameras C1 to C3 may be placed to prevent the existence of a blind spot. In a cranked path as shown in FIG. 13, a person P will fall outside the image taken with the camera C1 after moving only a few steps rightward. Therefore, even with a camera placement free from blind spots, if the accurate position of the person P is unknown, it will be difficult to select a suitable camera to follow the person P. By adopting the present invention, an out-of-sight place can be monitored over a wide area, and the position of the person P can be specified in images.

[0094] In other words, a camera capable of capturing the person P can be specified based on the position information obtained from the information tag. When the person P is at the position shown in FIG. 13, the camera image giving the largest figure of the person P, out of the images from the cameras C1 and C2, can be displayed automatically by detecting the person P using the tag information.

[0095] <Linking of Position Information of Information Tag with Image Coordinates>

[0096] For realization of the present invention, the position indicated by the position information of an information tag 11 must be linked in advance with coordinates in a camera image. This will be described briefly.

[0097] The position information of an information tag 11 is linked with image coordinates using a coordinate transformation T for transforming position coordinates (world coordinates) in the three-dimensional space in which a detection target exists to coordinates in a camera image. By determining the coordinate transformation T in advance, it is possible to link the position coordinates of an information tag with coordinates in an image.

[0098] In general, the coordinate transformation T may be theoretically computed based on the layout of the camera (the focal length of the lens, the lens distortion characteristic, and the size and number of pixels of the imaging device) and the conditions of placement of the camera (the position and posture of the camera), or may be determined with the procedure of camera calibration described later. When the camera layout and the camera placement conditions are known, the coordinate transformation T can be determined by a combined calculation including geometric transformation and the like. When the camera layout and the camera placement conditions are not known, the coordinate transformation T can be determined by camera calibration.

[0099] A method for determining the coordinate transformation T by camera calibration will be described with reference to the flow shown in FIG. 14. Assume in this description that the position, posture and zoom of the camera are fixed. First, at least six sets of position coordinates in the three-dimensional space in which a detection target exists (world coordinates) and the corresponding position coordinates in an image (image coordinates) are prepared (E11). A linear transformation satisfying the correspondence of the coordinate sets prepared in step E11 is determined by the method of least squares or the like (E12). The computed parameters of the linear transformation (camera parameters) are stored (E13).

[0100] The stored camera parameters may be used for subsequent transformation of the position coordinates of an information tag to image coordinates.
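A compact sketch of steps E11 to E13 and the subsequent use of the stored parameters is given below, using the classic direct linear transformation (DLT) as the least-squares method. This is one standard way to realize the calibration described above, not the only one; at least six world-to-image correspondences are needed, as stated in the flow.

```python
import numpy as np

def calibrate(world_pts, img_pts):
    """E11/E12: return the 3x4 camera matrix minimizing least-squares error.

    world_pts: iterable of (X, Y, Z) world coordinates.
    img_pts:   matching (u, v) image coordinates, at least six pairs.
    """
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, img_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # least-squares solution = right singular vector of the smallest
    # singular value; reshape into the 3x4 projection matrix
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 4)          # E13: store these camera parameters

def world_to_image(P, xyz):
    """Apply the stored parameters to tag position coordinates."""
    u, v, w = P @ np.append(np.asarray(xyz, dtype=float), 1.0)
    return u / w, v / w
```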

[0101] When the position, posture or zoom of the camera is changed, a coordinate transformation for the state after the change may be prepared again by camera calibration. If a sensor is separately provided to detect the position, posture and zoom (lens focal length) of the camera, new parameters can be determined by calculation. For a camera whose position and posture change frequently, such as a camera mounted on a mobile unit like a robot or a car, it is desirable to detect the position and posture of the camera with a separate sensor and determine the camera parameters by calculation every time a change is detected.

[0102] (Embodiment 2)

[0103] In Embodiment 2 of the present invention, assume that a camera-equipped movable robot detects an object as a given detection target.

[0104] FIG. 15 shows a situation in this embodiment. Referring to FIG. 15, a movable robot 40 is placed on a floor FL in a house. The robot 40, as the object detection equipment, includes a camera 13 as the imaging section, as well as the tag communication section 12, the target detection section 14 and the attribute lookup section 15 described in Embodiment 1. The attribute storage section 16 is placed at a position apart from the robot 40, so that the attribute lookup section 15 refers to attribute information stored in the attribute storage section 16 via radio communication.

[0105] A cylindrical object Oa lying on its side and a spherical object Ob, both red, are on the floor FL. An information tag 11 is embedded in each of the objects Oa and Ob and transmits ID information as tag information. Antennas 43a to 43d are placed at the four corners of the floor FL, allowing the information transmitted from the information tags 11 to be received by the tag communication section 12, via an antenna 42 of the robot 40, by way of the antennas 43a to 43d. The attribute lookup section 15 reads the shape and color of the object from the attribute storage section 16 as attribute information. The tag communication section 12 estimates a rough position of an information tag 11 from the ratio of the reception intensities among the antennas 43a to 43d placed at the four corners.
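The disclosure does not give the estimation formula for this rough position. As one simple illustration, it could be taken as an intensity-weighted centroid of the four antenna positions, as sketched below; the floor dimensions and intensity values in the example are hypothetical.

```python
def estimate_tag_position(antenna_xy, intensities):
    """antenna_xy: [(x, y), ...] for antennas 43a-43d; intensities: received power."""
    total = sum(intensities)
    x = sum(p[0] * w for p, w in zip(antenna_xy, intensities)) / total
    y = sum(p[1] * w for p, w in zip(antenna_xy, intensities)) / total
    return x, y

# e.g. four corners of a 4 m x 4 m floor, strongest signal at antenna 43a
print(estimate_tag_position([(0, 0), (4, 0), (4, 4), (0, 4)],
                            [0.9, 0.3, 0.1, 0.4]))
```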

[0106] Assume that the robot 40 moves, catches hold of the object Oa or Ob as a given detection target with its hand, and moves the object. To catch hold of an object with a hand 41, the robot 40 must accurately detect the position, shape and orientation of the object.

[0107] FIG. 16 shows an image taken with the camera 13 in the situation shown in FIG. 15. The existence of two kinds of objects is recognized from the ID information from the information tags 11. Rough positions of the information tags 11 are estimated from the ratio of the reception intensities among the antennas 43a to 43d. In this case, therefore, the ID information and the radio intensity are used as information from the information tags 11.

[0108] Assume that for one of the two kinds of objects, the object Oa, for example, attribute information indicating that the shape is cylindrical and the color is red has been obtained by looking up the attribute storage section 16 using the ID information. In this case, the image segmentation portion 141 (not shown in FIG. 15) of the target detection section 14 determines candidate areas that are red in the image, based on the attribute information that the detection target is red. FIG. 17 is an image showing the result of this determination, in which two candidate areas BOa and BOb, respectively corresponding to the red objects Oa and Ob, are shown.

[0109] A candidate area B1 as shown in FIG. 18 is obtained based on the position information of the object Oa estimated from the reception intensity. The candidate areas BOa and BOb in FIG. 17 and the candidate area B1 in FIG. 18 are integrated to obtain an area B2 as shown in FIG. 19. In this way, the position of the object Oa is accurately obtained.

[0110] The image recognition portion 142 (not shown in FIG. 15) of the target detection section 14 generates shape template images having the various orientations a cylindrical object can take, based on the attribute information that the object is cylindrical. Template matching is performed for the image of the area B2 using these shape templates, and the orientation (posture) of the object can be accurately determined from the orientation corresponding to the shape template giving the highest degree of matching.

[0111] The position and posture of the other object Ob can also be accurately detected in the manner described above. As a result, the robot 40 can obtain the information necessary for moving the objects Oa and Ob with the hand 41. In other words, with the color and shape obtained as attribute information, image segmentation and image recognition can be performed for a specific color and shape, and in this way, highly precise, efficient object detection can be realized.

[0112] As described above, by effectively using the information obtained from an information tag 11 attached to an object together with the information of an image, the position and posture of the object can be detected with high precision. Moreover, the processing amount required for the detection can be kept low. This makes it possible to realize the object detection required for a robot to work in a complicated environment such as a room of a house.

[0113] Accurate and real-time processing is desired in the detection of a target by a robot. Therefore, a technology such as that of the present invention, which improves detection precision while keeping the processing amount from increasing, is effective. For example, for robots used for rescue operations, care and the like, a delay in processing may threaten human life or cause injury. Also, assume a case in which a robot is intended to come into contact with a specific person among a number of persons. Even if the specific person is identified accurately, the person will pass by the robot while the robot is still executing the processing if the processing time is excessively long. In such a case, also, by adopting the present invention, the target can be detected in an image swiftly and accurately.

[0114] In the example described above, the relationship between the ID information and the attribute information was stored in the attribute storage section 16. Alternatively, the attribute information may be directly stored in the information tag 11 attached to an individual object. With this direct storage, the attribute storage section and the attribute lookup section can be omitted, and thus the system layout can be simplified. Conversely, when the attribute storage section 16 is provided, the memory capacity of the information tag 11 can be small even when the amount of information used for detection increases. This enables reduction of the size and cost of the information tag 11. Also, the communication capacity between the information tag 11 and the tag communication section 12 can be kept from increasing.

[0115] A detection procedure itself may be recorded in the information tag 11. For example, in the illustrated example, information on a detection procedure such as “detecting a red object” and “producing cylindrical shape templates and performing shape matching” may be transmitted from the information tag 11 attached to the object Oa. Alternatively, information on a detection procedure may be stored in the attribute storage section 16 in association with the ID information, so as to be read from the attribute storage section 16 based on the ID information received from the information tag 11. In these cases, the target detection section 14 simply executes processing according to the detection procedure received from the information tag 11 or read from the attribute storage section 16. This can simplify the detection program installed in the robot itself. Moreover, even when a different kind of object is added, no change is necessary to the detection program installed in the robot, and thus maintenance is simplified.
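As an illustration of this decoupling, a detection procedure carried by (or looked up for) a tag might be represented as a list of named steps that the robot executes generically. The format, the step names and the registry below are hypothetical, not defined by this disclosure.

```python
# Hypothetical procedure descriptions, keyed by a procedure ID that the
# tag (or the attribute storage section) would supply
PROCEDURES = {
    "red_cylinder": [("color_filter", {"color": "red"}),
                     ("shape_match", {"shape": "cylinder"})],
}

def run_procedure(image, proc_id, steps_impl):
    """steps_impl maps step names to callables supplied by the robot;
    adding a new kind of object only adds a PROCEDURES entry."""
    result = image
    for name, params in PROCEDURES[proc_id]:
        result = steps_impl[name](result, **params)
    return result
```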

[0116] (Embodiment 3)

[0117] In Embodiment 3 of the present invention, a person as a subject is detected, as a given detection target, with a portable camera. The portable camera as used herein includes a hand-held video camera, a digital camera, a camera-equipped mobile phone and an information terminal.

[0118] FIG. 20 is a view showing a situation in this embodiment, in which a person Pd is photographed outdoors with a portable camera 50 having the imaging section 13. The person Pd carries an information tag 11, which transmits ID information specifying the person Pd as tag information via an ultrasonic wave. When an ultrasonic transmitter like those generally used for distance measurement is used, the detection range is about 20 m from the camera, although it varies with the transmission intensity of the ultrasonic wave.

[0119] The camera 50 includes two microphones 51a and 51b, a tag communication section 12A and a target detection section 14A. The microphones 51a and 51b receive the ultrasonic wave transmitted by the information tag 11. The tag communication section 12A obtains ID information from the ultrasonic signals received by the microphones 51a and 51b, and also computes the direction and distance of the information tag 11 with respect to the camera 50. The direction of the information tag 11 with respect to the camera 50 can be estimated from the time difference (phase difference) or the intensity ratio between the ultrasonic signals received by the two microphones 51a and 51b. The distance from the information tag 11 can be estimated from the degree of attenuation (the degree to which the intensity and the waveform dull) of the received ultrasonic signals.
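For illustration, under the usual far-field two-microphone model the arrival-time difference Δt over a microphone baseline d gives the bearing via sin θ = c·Δt/d. The attenuation-based range below uses a placeholder linear dB-per-meter loss, since the actual attenuation model is not specified in this disclosure.

```python
import math

SPEED_OF_SOUND = 340.0   # m/s, approximate value in air

def direction_from_time_diff(dt_s, baseline_m):
    """Bearing (radians) of the tag relative to the camera's optical axis."""
    s = SPEED_OF_SOUND * dt_s / baseline_m
    return math.asin(max(-1.0, min(1.0, s)))   # clamp against noise

def distance_from_attenuation(rx_level, tx_level, atten_db_per_m=1.0):
    """Very rough range from the received level, assuming a linear dB/m loss."""
    loss_db = 10.0 * math.log10(tx_level / rx_level)
    return loss_db / atten_db_per_m

# e.g. a 0.1 ms arrival difference over a 10 cm baseline -> about 20 degrees
print(math.degrees(direction_from_time_diff(1e-4, 0.1)))
```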

[0120] FIG. 21 is a block diagram showing a configuration example of the camera 50 as the object detection equipment. Referring to FIG. 21, the tag communication section 12A is essentially composed of the microphones 51a and 51b, a distance determination portion 52, a time difference computation portion 53, a direction determination portion 54 and an ID extraction portion 55. The target detection section 14A is essentially composed of an image coordinate determination portion 56 and an image recognition portion 142. The two microphones 51a and 51b are placed at positions horizontally different from each other.

[0121] The distance determination portion 52 computes the distance from the information tag 11 based on the intensity of the ultrasonic waves received by the microphones 51a and 51b. The time difference computation portion 53 computes the difference in detection time between the ultrasonic signals received by the microphones 51a and 51b. The direction determination portion 54 computes the direction of the information tag 11 with respect to the camera 50 (the direction in the horizontal plane including the microphones 51a and 51b) based on the detection time difference computed by the time difference computation portion 53. Assume that the direction determination portion 54 holds the correspondence between the detection time difference and the direction.

[0122] The image coordinate determination portion 56 determines the horizontal position of the information tag 11 in an image based on the direction of the information tag 11 obtained by the direction determination portion 54 and the lens focal length (degree of zooming) of the imaging section 13 sent from a camera control section 57. This processing is substantially the same as the processing of associating the position information of an information tag with position coordinates in an image described in Embodiment 1.
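Under a pinhole model, this determination reduces to mapping the bearing θ to a horizontal pixel offset of f·tan θ from the image center, with f the focal length in pixels implied by the current zoom state; the values in the example below are assumed for illustration.

```python
import math

def tag_column_px(theta_rad, focal_px, image_width_px):
    """Horizontal image coordinate of a tag seen at bearing theta."""
    cx = image_width_px / 2.0                 # image center column
    return cx + focal_px * math.tan(theta_rad)

# e.g. a tag 20 degrees off-axis, 800-px focal length, 1280-px-wide image
print(tag_column_px(math.radians(20), 800.0, 1280))   # about 931 px
```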

[0123] The ID extraction portion 55 extracts ID information from the ultrasonic signals received by the microphones 51a and 51b. The attribute lookup section 15 reads the height of the person Pd, as the detection target, from the attribute storage section 16 using the ID information. A template generation part 143 of the image recognition portion 142 generates a shape template reflecting the size of the person Pd in the image, based on the distance obtained by the distance determination portion 52, the height of the person Pd obtained by the attribute lookup section 15 and the lens focal length sent from the camera control section 57. A template matching part 144 performs matching using the template generated by the template generation part 143 in an area of the image at and around the position of the information tag 11, to detect the position of the person Pd. The detected position information of the person Pd is given to the camera control section 57. Using the position information, the camera control section 57 controls the imaging section 13 to perform more accurate focus adjustment, exposure adjustment, color correction, zoom adjustment and the like.

[0124] A flow of the processing in this embodiment will be described with reference to the flowchart of FIG. 22, taking the situation shown in FIG. 20 as an example.

[0125] First, the information tag 11 carried by the person Pd, as the subject or detection target, transmits ID information via an ultrasonic signal (T1). The microphones 51a and 51b of the camera 50 receive the transmitted ultrasonic signal (T2). The difference in the reception time of the ultrasonic signal between the microphones 51a and 51b is computed (T3), and the direction θ of the information tag 11 with respect to the camera 50 is computed from the reception time difference (T4).

[0126] The distance D from the information tag 11 is computed from the intensity of the ultrasonic signal received by the microphones 51a and 51b (T5). At this time, whether or not the position in the direction θ falls within the image taken is determined considering the zoom magnification of the imaging section 13 (T6). If it is determined that the position does not fall within the image (NO in T6), the zoom magnification is lowered so that the position falls within the image; the orientation of the imaging section 13 is turned toward the direction θ (if the imaging section 13 is mounted on a movable pan head); or a message is displayed on a monitor of the camera 50 notifying the user that the detection target is outside the coverage, or on which side of the image the detection target exists, for example, to urge the user to turn the camera 50 (T7). In this relation, recording may be automatically halted as long as it is determined that the target is outside the coverage, and may be automatically started once the target falls within the coverage.

[0127] If it is determined that the position in the direction θ falls within the image (YES in T6), the focus adjustment of the camera is performed according to the distance D (T8). Assume that an image as shown in FIG. 23 is obtained by this adjustment. The position (area) L1 of the information tag 11 in the image is then determined based on the zoom magnification and the direction θ (T9). Note, however, that when the direction is computed with an ultrasonic signal, an error may occur due to the influence of temperature, wind, reflection from surrounding objects, noise and the like. Therefore, it is difficult to limit the area narrowly enough to specify a single person. In the example shown in FIG. 23, not only the person Pd to be detected but also a person Pe is included in the area L1.

[0128] ID information is extracted from the ultrasonic signal (T10), and using the ID information, the height H of the target is acquired (T11). A shape template T corresponding to the size the target is supposed to have in the image is generated based on the height H, the distance D and the zoom magnification (T12). FIG. 24 shows an example of the template T generated in this way. With the template T as shown in FIG. 24, matching is performed in an area covering the position L1 and its surroundings in the image. The position giving the highest degree of matching is determined as the accurate position L2 (T13). In this matching, only the person Pd is detected because the person Pe differs from the person Pd in size on the image. The position L2 is displayed on the image, allowing the user to easily adjust the orientation of the camera and the like as required. Also, adjustment of the aperture and exposure, color correction, focus adjustment and the like of the imaging section 13 are performed according to the color, brightness and quality of the area of the position L2 and its surroundings (T14). In this way, even a user unfamiliar with photographing can take images of the person Pd as the subject accurately.

[0129] Step T5 may be executed at any time after step T2 and before the distance D is used. Step T10 may be executed at any time after step T2 and before step T11.

[0130] As described above, in this embodiment, persons similar to each other in size on an image (persons Pd and Pf) and persons close to each other in position (persons Pd and Pe) can be distinguished correctly. Accordingly, even in a situation including a number of persons, the position of the target can be accurately detected without largely increasing the processing amount. In addition, by using an ultrasonic transmitter as the information tag, the direction and distance can be advantageously computed with a simple and inexpensive system.

[0131] In the example described above, two microphones were used. The number of microphones is not limited to two; three or more may be used. If three or more microphones are used, it is possible to compute the direction of the information tag using a plurality of combinations of any two microphones and average the computation results, thereby improving the precision of the direction computation.

[0132] The camera may trigger the information tag to transmit an ultrasonic wave by use of an ultrasonic wave, a radio wave, light or the like. In this case, the time taken from the triggering until reception of the ultrasonic signal may be measured, to compute the distance from the information tag based on the measured time and the speed of sound.

[0133] The detection processing described above may be performed only when specific ID information is obtained. This enables detection of only a target having a specific information tag in a situation where a plurality of information tags exist.

[0134] Both the information from the information tag and the image processing may not necessarily be used at all times. For example, if a signal from the information tag is temporarily unreceivable due to the influence of noise and the like, the reception state is determined to be bad, and the processing may be automatically switched to detection processing with images alone. In this case, the template used immediately before the switching may be used. Conversely, if the detection with a template in an image temporarily fails due to the influence of a change of sunlight and the like, failure of this detection is determined, and the processing may be automatically switched to detection with only the information on the direction and distance of the information tag. In this way, if one of the two types of information is unusable, the processing is automatically switched to use only the other type of information. This may lower the detection precision but prevents complete loss of sight of the target, and thus object detection equipment robust against changes in the situation can be attained.

[0135] FIG. 25 is a flowchart of a procedure for switching the processing. Referring to FIG. 25, first, whether or not an ultrasonic signal has been normally received by the microphones is determined (K1). If it has been normally received (YES in K1), template matching is performed in an area of the image at and around the position of the information tag estimated in the processing described above (K2). Whether or not the maximum degree of matching is equal to or more than a predetermined threshold is determined (K4). If it is equal to or more than the threshold (YES in K4), the position giving the maximum degree of matching is determined as the detected position of the person (K7). If it is less than the threshold (NO in K4), the position estimated from the ultrasonic signal is adopted as the detected position of the person (K6). Since this detection is low in reliability, a message notifying that the detection precision is low or that detection with an image is difficult may be displayed on the monitor.

[0136] If normal reception of an ultrasonic signal fails in the step K1 (NO in K1), template matching is performed over the entire image (K3). Alternatively, template matching may be performed in an area at and around the position detected in the preceding frame. Whether or not the maximum degree of matching is equal to or more than a predetermined threshold is determined (K5). If it is equal to or more than the threshold (YES in K5), the position giving the maximum degree of matching is determined as the detected position of the person (K9). In this case also, since the detection is low in reliability, a message notifying that the detection precision is low or that detection with ultrasonic waves is difficult may be displayed on the monitor. If it is less than the threshold (NO in K5), failure of position detection is determined. This is displayed on the monitor, or the position detected in the preceding frame is adopted (K8).
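
The branching of FIG. 25 can be summarized in a short sketch; match_local, match_global and the other names are hypothetical stand-ins for the processing described above, each matcher returning a position and its matching score.

    def detect_position(ultrasound_ok, tag_estimate, match_local,
                        match_global, threshold, previous_position):
        # K1: was the ultrasonic signal received normally?
        if ultrasound_ok:
            pos, score = match_local(tag_estimate)  # K2: match near estimate
            if score >= threshold:                  # K4
                return pos                          # K7: reliable detection
            return tag_estimate                     # K6: tag only, low reliability
        pos, score = match_global()                 # K3: match over whole image
        if score >= threshold:                      # K5
            return pos                              # K9: image only, low reliability
        return previous_position                    # K8: detection failed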

[0137] In the example described above, the detection procedure is switched when one of the information types is unusable. The following techniques may also be adopted to change the detection procedure.

[0138] When a plurality of types of ID information are obtained from information tags, for example, the range of the image within which the matching is performed may be made wider than in the case where a single type of ID information is obtained. With this setting, occurrence of misdetection can be suppressed even when interference of ultrasonic signals, caused by the existence of a plurality of transmitting sources, degrades the precision of position detection with an ultrasonic signal.
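
One possible scaling rule for this widening, with the linear growth factor being purely an assumption for illustration:

    def search_radius(num_ids, base_radius, per_extra_id=0.5):
        # Widen the matching window when several tags transmit at once,
        # since interference degrades the ultrasonic position estimate.
        return int(base_radius * (1.0 + per_extra_id * max(num_ids - 1, 0)))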

[0139] When there exist a plurality of positions giving high degrees of matching, the setting may be changed so that the position of the person obtained from the information tag is determined as the detected position. This can suppress erroneous detection of a wrong person, and the resulting frequent displacement of the detection position, when very similar persons exist close to each other.
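
This preference can be expressed as a small tie-breaking rule; the margin parameter and the other names below are hypothetical.

    def resolve_ambiguity(candidates, tag_position, margin):
        # candidates: list of (position, matching score) pairs. If more
        # than one candidate scores within margin of the best, fall
        # back on the tag-derived position instead of jumping between
        # look-alike persons.
        best = max(score for _, score in candidates)
        strong = [pos for pos, score in candidates if score >= best - margin]
        return strong[0] if len(strong) == 1 else tag_position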

[0140] Part or all of the processing performed by the object detection equipment of the present invention may be performed by exclusive equipment, or may be implemented as a processing program executed by a CPU incorporated in a computer. Alternatively, as in the monitoring center 31 shown in FIG. 5, the object detection server may receive an image taken by the imaging section and tag information transmitted from an information tag attached to a given target, and detect the given target in the image.

[0141] As described above, according to the present invention, by using an image and tag information transmitted from an information tag attached to a given target in an integrated manner, the target can be detected accurately in the image and the posture, motion and the like of the target can be specified, even in a place where the lighting condition greatly changes, such as outdoors, and in a situation where a plurality of persons and objects exist. Moreover, the increase in the processing amount can be suppressed, and thus the processing time and cost can be significantly reduced compared with the case of performing image processing alone.

1. An object detection equipment comprising: an imaging section for taking an image; a tag communication section for receiving tag information transmitted from an information tag attached to a given target; and a target detection section for detecting the given target in the image taken by the imaging section using the tag information received by the tag communication section.
2. The object detection equipment of claim 1, wherein the tag information includes attribute information representing an attribute of the given target, and the target detection section performs the detection using the attribute information included in the tag information received by the tag communication section.
3. The object detection equipment of claim 1, wherein the tag information includes ID information of the given target, the object detection equipment further comprises: an attribute storage section for storing a correspondence between ID information and attribute information; and an attribute lookup section for looking up contents of the attribute storage section using the ID information included in the tag information received by the tag communication section, to obtain attribute information of the given target, and the target detection section performs the detection using the attribute information obtained by the attribute lookup section.
4. The object detection equipment of claim 2 or 3, wherein the target detection section comprises: an image segmentation portion for determining a partial image area having the possibility of including the given target in the image; and an image recognition portion for detecting the given target in the partial image area determined by the image segmentation portion, and at least one of the image segmentation portion and the image recognition portion performs processing by referring to the attribute information.

5. The object detection equipment of claim 1, wherein the tag information includes position information representing a position of the information tag, and the target detection section performs the detection by referring to the position information included in the tag information received by the tag communication section.
6. The object detection equipment of claim 1, wherein the tag communication section estimates a position of the information tag from a state of reception of the tag information, and the target detection section performs the detection by referring to the position estimated by the tag communication section.

7. The object detection equipment of claim 1, wherein the tag information includes a detection procedure for the given target, and the target detection section performs the detection by executing the detection procedure included in the tag information received by the tag communication section.
8. The object detection equipment of claim 1, wherein the target detection section performs the detection with only image processing, without use of the tag information, when a reception state of the tag communication section is bad.
9. An object detection server for receiving an image taken by an imaging section and tag information transmitted from an information tag attached to a given target, and detecting the given target in the image using the tag information.
10. An object detection method comprising the steps of: receiving an image taken by an imaging section; receiving tag information transmitted from an information tag attached to a given target; and detecting the given target in the image using the tag information.